Recovering Latent Information in Treebanks
Abstract
Many recent statistical parsers rely on a preprocessing step that uses hand-written, corpus-specific rules to augment the training data with extra information. For example, head-finding rules are used to augment node labels with lexical heads. In this paper, we provide machinery to reduce the amount of human effort needed to adapt existing models to new corpora: first, we propose a flexible notation for specifying these rules that would allow them to be shared by different models; second, we report on an experiment to see whether we can use Expectation-Maximization to automatically fine-tune a set of hand-written rules to a particular corpus.
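As a rough illustration of the preprocessing step the abstract describes, the sketch below augments node labels with lexical heads using a small rule table. The rule table, the tuple-based tree encoding, and the function names are all assumptions made for this example; they are not the notation proposed in the paper, nor the rule set of any published parser.

```python
# Hypothetical, heavily simplified head rules:
# parent label -> (scan direction, priority list of child labels).
# Illustrative assumptions only, not the rule tables of any published parser.
HEAD_RULES = {
    "S": ("right", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def is_preterminal(tree):
    """A preterminal is (tag, word), where the word is a plain string."""
    return len(tree) == 2 and isinstance(tree[1], str)


def head_child_index(label, child_labels):
    """For each priority label in turn, scan the children in the rule's
    direction; fall back to the first child in scan order."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    order = list(range(len(child_labels)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if child_labels[i] == wanted:
                return i
    return order[0]


def lexicalize(tree):
    """Return (annotated_tree, head_word); each label becomes 'LABEL[head]'."""
    label = tree[0]
    if is_preterminal(tree):
        word = tree[1]
        return (f"{label}[{word}]", word), word
    results = [lexicalize(child) for child in tree[1:]]
    child_labels = [child[0] for child in tree[1:]]
    head = results[head_child_index(label, child_labels)][1]
    return (f"{label}[{head}]",) + tuple(t for t, _ in results), head


# Toy tree encoded as nested tuples: (label, child, child, ...).
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "cat"))))
annotated, head = lexicalize(tree)
```

Because the rules live in a plain data table rather than in the traversal code, the same annotator could in principle be reused with a different table for a different corpus, which is the kind of sharing the abstract argues for.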
Similar resources
Parsing German with Latent Variable Grammars
We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars. The learning procedure directly maximizes the likelihood of the training treebank, without the use of any la...
Using multivariate generalized linear latent variable models to measure the difference in event count for stranded marine animals
BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information on them to be very important. Therefore, this led to the need to retrieve and understand the data on the event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are g...
The Xavier Module – Information Processing of Treebanks
This paper aims to introduce the Xavier module, a program package to process Treebanks (in particular, the Sejong Korean Treebank). In this paper, the procedure of implementing Xavier is discussed, and main usage of the program is also provided. Though this paper focuses on the Sejong Korean Treebank, Xavier is also applicable to other Treebanks, such as the Penn Treebanks, because it has been ...
Latent Semantic Clustering of German Verbs with Treebank Data
Treebank data have been utilized as data sources for a wide range of tasks in computational linguistics, including statistical parsing, anaphora resolution, induction of valence lexica, etc. More recently, researchers have experimented with extracting semantic information from syntactically annotated data. Here, treebank data have been used for the purposes of identifying selectional preference...
Reconstructing Requirements Traceability in Design and Test Using Latent Semantic Indexing
Managing traceability data is an important aspect of the software development process. In this paper we define a methodology, consisting of six steps, for reconstructing requirements views using traceability data. One of the steps concerns the reconstruction of the traceability data. We investigate to what extent Latent Semantic Indexing (LSI), an information retrieval technique, can help recov...
Publication date: 2002